📊 Gradient Accumulation - miterion · Scour

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

dev.to·1d·

Discuss: DEV

🏎️TensorRT

Show HN: Model Training Memory Simulator

czheo.github.io·14h·

Discuss: Hacker News

🏎️TensorRT

🥇Top AI Papers of the Week

nlp.elvissaravia.com·9h

⚡ONNX Runtime

Attention Retention for Continual Learning with Vision Transformers

arxiv.org·2d

👁️Attention Optimization

25W06. Learning a language with the machine

z1nz0l1n.com·13h

Writing a ONNX Neural Network Inference Engine from Scratch in C to run image classification with MobileNetV2

flexw.github.io·5h·

Discuss: r/C_Programming

⚡ONNX Runtime

Main Content || Math ∩ Programming

jeremykun.com·1h

📉Model Quantization

Cross Entropy Derivatives, Part 6: Using gradient descent to reach the final result

dev.to·4h·

Discuss: DEV

📉Model Quantization

Quantization-Aware Distillation

ternarysearch.blogspot.com·21h·

Discuss: Hacker News

📉Model Quantization

Continual learning and the post monolith AI era

baseten.co·2d·

Discuss: Hacker News

🧩Attention Kernels

The Rise of Local Speech Recognition

oatmealapp.com·5h·

Discuss: Hacker News

🏎️TensorRT

Sequential Attention: Making AI models leaner and faster without sacrificing accuracy

research.google·4d·

Discuss: Hacker News, r/LocalLLaMA

👁️Attention Optimization

Why Files Are Not Enough as Memory for AI Agents

medium.com·13h·

Discuss: Hacker News

🧩Attention Kernels

It's not a Lottery, it's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task

arxiv.org·3d

🏎️TensorRT

Physics-Informed Neural Networks for Inverse PDE Problems

pub.towardsai.net·1d

🏎️TensorRT

Why do tree-based models still outperform deep learning on tabular data?

paperium.net·1d·

Discuss: DEV

⚡ONNX Runtime

Neural population geometry and optimal coding of tasks with shared latent structure

nature.com·2d

👁️Attention Optimization

**Abstract:** This paper introduces Automated Pedagogical Content Adaptation through Granular Knowledge Graph & Reinforcement Learning (GPKG-RL), a syst...

freederia.com·2d

🎓Model Distillation

Accelerate your discovery by parallelizing experiments

magellink.com·7h·

Discuss: Hacker News

🌐Distributed Computing

Is Your Machine Learning Pipeline as Efficient as it Could Be?

kdnuggets.com·2d

⚡ONNX Runtime

Loading more...